25. State-Value Functions
Note #1: The notation \mathbb{E}_\pi[\cdot] is borrowed from the suggested textbook. \mathbb{E}_\pi[\cdot] is defined as the expected value of a random variable, given that the agent follows policy \pi.
Note #2: In this course, we will use "return" and "discounted return" interchangeably. For an arbitrary time step t, both terms refer to G_t \doteq R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \ldots = \sum_{k=0}^\infty \gamma^k R_{t+k+1}, where \gamma \in [0,1]. In particular, when we refer to "return", it is not necessarily the case that \gamma = 1, and when we refer to "discounted return", it is not necessarily true that \gamma < 1. (This also holds for the readings in the recommended textbook.)
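The definition of G_t above can be sketched in code for a finite reward sequence. This is a minimal illustrative helper (the function name and signature are assumptions, not part of the course material); for an episodic task the infinite sum simply truncates at the terminal step.

```python
def discounted_return(rewards, gamma):
    """Compute G_t = sum_k gamma^k * R_{t+k+1} for a finite reward
    sequence [R_{t+1}, R_{t+2}, ...] and discount factor gamma in [0, 1].

    Illustrative sketch only; names are hypothetical, not from the course.
    """
    g = 0.0
    for k, r in enumerate(rewards):
        g += (gamma ** k) * r
    return g

# With gamma = 1, the "discounted return" reduces to the plain sum of rewards,
# matching the note that "return" does not imply gamma = 1.
print(discounted_return([1.0, 2.0, 3.0], gamma=1.0))  # 6.0
# With gamma = 0.5: 1.0 + 0.5*2.0 + 0.25*3.0 = 2.75
print(discounted_return([1.0, 2.0, 3.0], gamma=0.5))  # 2.75
```

Note that the state-value function v_\pi(s) is then just \mathbb{E}_\pi[G_t \mid S_t = s], the expected value of this quantity when starting in s and following \pi.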